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Abstract 

The human connectome represents a network map of the brain’s wiring dia¬ 
gram and the pattern into which its connections are organized is thought to 
play an important role in cognitive function. The generative rules that shape 
the topology of the human connectome remain incompletely understood. Ear¬ 
lier work in model organisms has suggested that wiring rules based on geomet¬ 
ric relationships (distance) can account for many but likely not all topological 
features. Here we systematically explore a family of generative models of the 
human connectome that yield synthetic networks designed according to different 
wiring rules combining geometric and a broad range of topological factors. We 
find that a combination of geometric constraints with a homophilic attachment 
mechanism can create synthetic networks that closely match many topological 
characteristics of individual human connectomes, including features that were 
not included in the optimization of the generative model itself. We use these 
models to investigate a lifespan dataset and show that, with age, the model 
parameters undergo progressive changes, suggesting a rebalancing of the gener¬ 
ative factors underlying the connectome across the lifespan. 
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1. Introduction 


The human connectome represents a network map of the brain in which 
regions and inter-regional connections are rendered into the nodes and edges 
of a graph. In this format, the connectome can be analyzed using tools from 
network science and graph theory ( ]Bullmore and Sporns 2009 Sporns, 2014). 
Network analyses of the connectome have revealed a host of attributes that are 
likely essential for healthy brain function, including hierarchical and multi-scale 


modules (Bassett et al. 2010 Betzel et ah, 2013), highly connected, highly 
central hubs ([Hagmann et al. 2008 van den Heuvel and Sporns, 2013), and 


a rich club of mutually connected, high-degree regions (van den Heuvel and 


Sporns, 2011). Additionally, the connectome’s topology (the pattern in which 


its connections are configured) is thought to play an important role in shaping 


task-evoked and spontaneous brain activity (Hermundstad et al. 2013 Gohi 


et al. 

2014 

Misic et al. 

2015 


The connectome is an example of a physical network whose nodes and edges 
are embedded in Euclidean space ( jBarthelemy 2011). Consequently, the for¬ 
mation of connections carries a material and metabolic cost that increases with 


connection length (Bullmore and Sporns 2012). To remain within the limits of 
viability, the human connectome maintains disproportionally many short-range 
(i.e. low cost) connections. Despite the importance of conserving connection 
cost, previous work in model organisms has shown that wiring minimization 


alone cannot account for all the connectome’s topological features (Kaiser and 


Hilgetag 2006 Costa et al. 2007aI. Rather, connectome networks strike a bal¬ 


ance wherein the formation of costly features like hubs and rich clubs trades off 
with a drive to reduce the total cost of wiring. 

The conditions that allow this tradeoff to emerge are the central topic of 
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this paper, and one that we explore using generative models applied to human 
connectome data obtained from individual participants. In the context of com¬ 
plex networks, generative modeling refers to a set of approaches for creating 
synthetic networks with properties similar to those of real-world networks. One 


example among many (Watts and Strogatz 1998 Kumar et al. 2000 Sole et al. 


2002 Vazquez et al. 2003 Dali and Christensen 2002 Middendorf et al., 20051 


is the preferential attachment model ( [Barabasi and Albert 1999), which gener¬ 
ates synthetic networks with heavy-tailed degree distributions similar to those 
observed in many real-world socio-technical networks. 

In this report we build upon and expand the tradition of generative models 
for brain networks by fitting many different generative models to single-subject 
human connectome data and comparing models in terms of their overall per¬ 
formance. The models we investigate combine two distinct mechanisms for 
network growth: 1) geometric wiring rules which influence connection forma¬ 
tion by favoring either shorter or longer connections and 2) non-geometric rules 
that ignore the distance between two regions and, instead, form connections 
on the basis of some shared topological relationship. Some of the models we 
consider implement rules that mimic well-established growth mechanisms like 
geometric random graphs, preferential attachment, degree assortativity, and ho- 
mophilic attraction. In all cases, our aim is to discover wiring rules that produce 
synthetic networks with properties similar to those of observed connectomes. 

To this end, we tuned our models’ parameters to generate realistic synthetic 
networks. We found that the best-fitting model was one that penalized the 
formation of longer connections while increasing the likelihood of forming con¬ 
nections between brain regions with similar connectivity profiles (homophily). 
We cross-validated this result, comparing synthetic and observed connectomes 
along measures other than those used in the optimization process and using 


3 



































three different datasets. Finally, we fit the optimal generative model to data 
from a lifespan study (with ages ranging from 7-85 years) and found that the 
penalty on long-distance connections weakened monotonically with age. Older 
subjects’ connectomes were fit poorly compared to those of younger individuals, 
a result driven primarily by an inability to match edge length and clustering 
coefficient distributions. This suggests that the human connectome undergoes 
a characteristic reorganization across the lifespan. 


2. Methods 


2.1. Data acquisition and processing 

A total of = 40 healthy participants underwent an MRI session on a 3-T 
Siemens Trio scanner with a 32-channel head-coil. The magnetization-prepared 
rapid graident-echo (MPRAGE) sequence was 1 mm in-plane resolution and 1.2 
mm slice thickness. The DSI sequence included 128 diffusion-weighted volumes 
plus one reference bo volume, maximum 6-value of 8000 s ■ mm~^ and 2.2 x 2.2 x 
3.0 mm voxel size. The echo planar imaging (EPl) sequence was 3.3 mm in-plane 
resolution and 0.3 mm slice thickness with TR of 1920 ms. DSI and MPRAGE 


data were processing using the Gonnectome Mapping Toolkit (Daducci et al. 


20121. Segmentation of grey and white matter was based on MPRAGE volumes. 


The cerebral cortex was parcellated into n = 219 ROls (Cammoun et al. 2012) 


of which we retained the 108 comprising the right hemisphere. We enforced an 
average connectome density of p « 10%, resulting in a streamline threshold of 27 
streamlines (i.e. a minimum of 27 streamlines must have connected two regions 
for us to consider the presence of an anatomical connection). These same data 


have been analyzed elsewhere (Avena-Koenigsberger et al. 2014 Gohi et al. 


2014 Betzel et al. 2013). 
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2.2. Generative algorithm 

In this report we generate synthetic networks using a generative model. 
The algorithm for producing synthetic networks is simple. Starting with a 
sparse seed network (62 bi-directional edges that were common across all 40 
participants), edges were added one at a time over a series of steps until M total 
connections were placed (where M = 576 ±57 connections across subjects). At 
each step we allow for the possibility that any pair of unconnected nodes, u 
and V, will be connected. Connections are formed probabilistically, where the 
relative probability of connection formation is given by: 

P{u,v) = E{u,v)^ X K{u,v)'^ (1) 

In this expression E{u, v) denotes the Euclidean distance between brain re¬ 
gions u and V. The exponent rj controls the characteristic connection length. 
When rj < 0, short-range connections are favored, while ry > 0 increases the 
probability of forming longer connections. The other term, K(u,v), represents 
an arbitrary non-geometric relationship between nodes u and v and the value 
of 7 scales its relative importance. The precise definition of K{u,v) is flexi¬ 
ble and can be varied to realize different wiring rules. For instance, setting 
K{u,v) = kukv and 7 > 0 implements a variant of preferential attachment, 
wherein higher degree nodes are more likely to become connected. Alterna¬ 
tive definitions can be used to implement rules such as degree assortativity (e.g. 
K{u,v) = \ku — ky\, where nodes with similar/dissimilar numbers of connections 
preferentially connect to one another) or homophily (e.g. K{u, v) = ^uwO-wv 
where connections form between nodes with more or fewer common neighbors). 
In Table[^we show a complete list of all non-geometric wiring rules. We limit our 
analysis to generative models whose wiring rules include only two components, 
though we could accommodate more components, in principle. We impose this 
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limit in an effort to focus on highly simple, idealized models of network growth. 

To prevent cases where P{u,v) is undefined (e.g. if K{u,v) = 0 and 7 < 0 
then P{u,v) = 00 , we add e = 10“® to each K{u,v) before raising it to the 
power, 7 . Over the course of the generative process new edges are added to the 
synthetic network which necessarily changes the value of K{u,v). Accordingly, 
at each step we update K{u,v) and the corresponding changes to P{u,v). If, 
at any step, the edge {u, u} is added to the synthetic network, then P{u, v) = 0 
for all subsequent steps. See Figure [ST^ for an illustration of the model using a 
toy network model. 

In our model we use Euclidean distance as a proxy for the cost of the connec¬ 
tion between brain regions u and v. It is worth noting that there are alternative 
measures for quantifying the cost or spatial relatedness of node pairs, includ¬ 


ing measures derived from the network’s spatial embedding (Friedman et al. 


20151. Another candidate measure of, perhaps, greater neurobiological interest 


is fiber length, which measures the actual curved trajectories of white-matter 
tracts rather than the straight-line (Euclidean) distance between brain region 
centroids. While Euclidean distance and fiber length are correlated with one 
another, there are many instances where the fiber length of a connection is much 
longer than what would be expected given Euclidean distance. A more detailed 


discussion of this topic can be found in the Appendeix (Figures SIO and Sll). 


2.3. Evaluating synthetic network fitness 

To assess the fitness of a synthetic network we calculated its energy, which 
measures how dissimilar a synthetic network is to the observed connectome. 
Intuitively, if the two networks have many properties in common, then the 
synthetic network’s energy is small. Specifically, a synthetic network’s energy 
was defined as: 

E = max{KSk,KSc,KSb,KSe) (2) 
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where the arguments are Kolmogorov-Smirnov statistics which quantify the dis¬ 
crepancy between the synthetic and observed connectomes in terms of their 
degree (k), clustering (c), betweenness centrality (5), and edge length (e) distri¬ 
butions. Here, edge length refers to the Euclidean distance between the centroids 
of two connected brain regions. By taking the maximum of the four statistics 
we consider a synthetic network to be only as fit as its greatest discrepancy. 

2.4- Model optimization 

Given the generative rule and the energy measure for evaluating a model 
network’s goodness of ht, it was important to find the parameters { 77 , 7 } that 
produced networks with the lowest possible energy values. To solve this opti¬ 
mization problem, we developed a simple procedure based on classical Monte 
Carlo methods. The procedure consisted of three stages that were repeated: 

1. A sampling stage in which points in parameter space are selected 

2. An evaluation stage, where synthetic networks are generated with the 
previously-selected parameter values and their energies calculated. 

3. A partitioning stage, in which the entire parameter space is partitioned 
according to a Voronoi tessellation. 

The procedure is initialized in stage 1 by randomly sampling Ngamp = 2000 
points from parameter space. After evaluating the energy at each point and 
partitioning the entire parameter space into Voronoi cells, the algorithm returns 
to stage 1. Rather than sample points randomly, points are now sampled from 
within the boundaries of Voronoi cells, where the probability of drawing a point 
from within any given cell is inversely proportional to that cell’s energy {P{C) oc 
where Ec is the energy of Voronoi cell, C, and P{C) is the relative 
probability of sampling from within that cell). This procedure ensures that 
points are sampled preferentially from low-energy regions of parameter space. 
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We repeated stages 1, 2, and 3 a total of five times and varied a with each 
repetition, going from a = {0.0,0.5,1.0,1.5, 2.0}. Early on, the low values 
of a meant that we searched the parameter space randomly, while the larger 
values at later repetitions allowed us to focus in on the low energy regions. 
We emphasize that alternative optimization schemes could be used to minimize 
E (e.g. simulated annealing); the approach used here was chosen because it 
allowed us to not only converge to good solutions, but also to explore the energy 
landscape. 


3. Results 


We fit generative models to the connectomes of individual participants. In 
the main text, we focus on 40 adults (ages 18-40 years) scanned at the De¬ 
partment of Radiology, University Hospital Center and University of Lausanne 
(CHUV), Lausanne, Switzerland. The Appendix contains results from replica¬ 
tion cohorts of 214 and 126 participants from the Human Connectome Project 
(HCP) (Van Essen et al. 2012[ Glasser et ah, 20131 and the Nathan Kline In¬ 


stitute, Rockland, New York (NKI) cohort (Nooner et al. 2012), respectively. 
In the same Appendix we also investigate the sensitivity of our results to alter¬ 
native processing streams. 


3.1. Geometric model 

It is well known that the connectome’s physical embedding shapes its topol¬ 


ogy by promoting the formation of low-cost connections (Bullmore and Sporns 


2012). On the other hand, forming only the shortest connections produces a 


skewed edge length distribution lacking long-distance connections (Kaiser and 


Hilgetag 20061, resulting in increased characteristic path length, thereby reduc¬ 


ing the efficiency with which information can flow between distant brain regions. 

















We first sought to test the extent to which cost conservation shapes the topol¬ 
ogy of the human connectome by implementing a pure geometric model (i.e. 
K{u,v) = 1). 

For each participant we tuned the free parameter, 77 , to a range where the 
geometric model consistently produced synthetic networks with near-minimal 
energies (Figure]^) and analyzed the top 1 % lowest-energy synthetic networks 
(100 networks/participant). At this point in parameter space (77 = —4.01 ± 
0.31; sample meanistandard error; see Figur^^), synthetic networks achieved 
an average energy oi E = 0.29 ± 0.02 with KS statistics KSk = 0.15 ± 0.03, 
KSb = 0.18 ±0.04, KSe = 0.27 ±0.03, and KS^ = 0.29 ± 0.02 (Figurej^). To 
contextualize these scores, we compared them to KS statistics obtained from a 
null generative model where connections were formed with uniform probability. 
We found that, with the exception of KSe {p « 0.4; Wilcoxon signed-rank test 


(Wilcoxon, 19451), the geometric model produced significantly lower energy and 
smaller KS statistics (maximum p « 10“®). 

Interestingly, the point at which energy is minimized deviates from the re¬ 
spective minima of KSe and KSc, demonstrating that even the-best fitting 
synthetic networks generated by the geometric model cannot simultaneously 
match observed connectomes in terms of clustering and edge length distribu¬ 
tions. The reason for this is intuitive: A strong distance penalty is required to 
generate highly clustered networks, which inadvertently penalizes the formation 
of long-distance connections. Conversely, realistic edge length distributions arise 
when the distance penalty is relatively weak, at which point synthetic networks 
become vastly under-clustered. The energy minimum occurs at a point situated 
between these two extremes, trading off accuracy along one dimension with the 
other though never simultaneously minimizing both (Figure]^). 
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3.2. Models driven hy geometry and topology outperform pure geometric models 

The failure of the pure geometric model to generate synthetic networks that 
were as clustered and contained as many long-distance connections as observed 
connectomes suggests that additional factors are needed as part of a realis¬ 
tic generative mechanism. To determine which factors were most capable in 
this regard we compared twelve different generative models where topological 
features such as degree, clustering, and homophily influenced the connection 
formation probabilities. As expected, due to the additional free parameter, 7 , 
we find that all dual-factor models outperformed the pure geometric model, gen¬ 
erating synthetic networks with significantly lower energies {p « 0, see Figure 
[^. Importantly, dual-factor models were stratified, with clustering-based mod¬ 
els outperforming degree-based models, which in turn were outperformed by 
homophily-based models. The absolute best model incorporated a homophilic 
attraction mechanism in the form of the matching index (MI), which is a nor¬ 
malized measure of overlap in two nodes’ neighborhoods. If T^ = {n : Ouv = 1} 
represents the set of node u’s neighbors, then the matching index is equal to: 


u I 


(3) 


where is simply r„ but with v excluded from the set. In the event that 
u and V have perfect overlap in their neighborhoods, then = 1. If the 
neighborhoods contain no common elements then Muv = 0 . 

Applied to the CHUV dataset, the MI model achieved an average energy 
oi E = 0.12 ± 0.02 with parameters rj = —0.98 ± 0.37 and 7 = 0.42 ± 0.04 
(Figure [^). Together, these parameter values indicated that, like the pure 
geometric model, the MI model exercised a penalty against long distance con¬ 
nections (albeit markedly weaker than the geometric model), while increasing 
the probability that nodes with similar connectivity profiles would connect to 
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one another. Interestingly, the parameters rj and 7 appear to trade off with 
one another (Figure]^), suggesting that the more an individual’s connectome 
is shaped by geometry (large amplitude of ry), the less it is shaped by non¬ 
geometric constraints and vice versa. On average, the MI model outperformed 
the geometric model in reducing discrepancies along all four components of the 
energy function: KSk = 0.10 ± 0.03, KSb = 0.10 ± 0.02, KS^ = 0.10 ± 0.03 
and KSc = 0.11 ± 0.02 (maximum p-value for all KS statistics and energy was 
p « 10“^, Wilcoxon signed-rank test). Whereas the geometric model’s perfor¬ 
mance was limited primarily by mismatches in clustering and edge length, the 
Ml model’s performance was more evenly limited. The best-fitting synthetic 
networks had energies equal to iFS'fe, KSc, and KSc around 21%, 25%, 

29%, and 25% of the time, respectively. 

3.3. Evaluating synthetic networks using additional measures 

Our analyses to this point consisted of tuning the parameters of generative 
models to ranges where the synthetic networks achieved low energy, which iden¬ 
tified the MI model as the best fitting model. The form of the energy function, 
however, may be considered ad hoc; it represents only one of many alternative 
ways to evaluate a synthetic network’s fitness. For this reason it was important 
to establish that the best-fitting synthetic networks generated by the MI model 
matched observed connectomes across additional dimensions that were not part 
of the energy function used for optimization. To that end, we subjected the 
lowest-energy synthetic networks to a series of additional tests to determine 
whether they could also reproduce other properties of the human connectome. 

3.3.1. Graph theoretic measures 

The first test involved evaluating the best-fitting synthetic networks in terms 
of how well they matched graph-theoretical properties of observed connectomes, 
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focusing on the measures: mean clustering coefficient (C), global efficiency {E), 
degree assortativity (Rk), modularity (Q), characteristic path length (L), and 
network diameter {max[D]) (see Appendix for descriptions of these measures). 
We estimated the magnitude of correlation between graph measures made on 
synthetic networks generated by the MI model and the same measures made 
on empirical networks. We found that the MI model did an excellent job re¬ 
producing the rank order of individual participants’ mean clustering coefficient 
(r = 0.90, p « 0), modularity (r = 0.69, p « 10“®), characteristic path length 
(r = 0.86, p « 10“^^), and efficiency (r = 0.64, p « 10“®). Network diameter 
(r = 0.23, p = 0.15) and degree assortativity (r = 0.05, p = 0.74) were not well 
matched (Figure|^). It should be noted that, in general, most graph measures 


are not completely orthogonal to one another (Costa et ah, 2007b). 

While the MI model generally reproduced the rank order of participant-level 
graph measures, it nonetheless systematically over-/under-estimated the values 
of certain measures. For instance, efficiency was, on average, smaller for syn¬ 
thetic networks than for empirical networks (points falling above the diagonal 
in Figure third panel). The same is true for characteristic path length 
(over-estimated). Despite these biases, the discrepancy between empirical and 
synthetic networks for any of these measures was, on average, small - across 
participants, the mean clustering coefficient, modularity, path length, and effi¬ 
ciency scores of synthetic networks were always within 5.5% of the same measure 
made on the corresponding observed network. 


3.3.2. Distance-dependent degree assortativity 

The human connectome features hub regions linked by long distance con¬ 
nections, forming rich clubs and cores. This propensity for higher-degree nodes 
to be linked by longer connections should be reproducible by a good generative 
model. To assess whether this were the case, we extracted and pooled across 
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participants the list of all connections, the degrees of their stubs and ky), and 
length {E{u,v)). From these data, we estimated the three-dimensional cumu¬ 
lative distribution function, F^konkp, E(a, /3)). At any point {fc^, fc/j, i?(a,/?)}, 
the value of F corresponded to the fraction of all connections satisfying ku < ka, 
kv < kp, and E{u,v) < E(a,P) and ky were ordered so that ky < ky). We 
constructed similar distributions for the best-fitting synthetic networks gener¬ 
ated by each model and quantified the discrepancy between distributions with a 
KS statistic. In general, the rank order of models scored by this KS statistic was 
similar to the rank order of their energies (Figure]^). The MI model achieved 
the smallest KS statistic {KS = 0.12 ± 0.01) while the pure geometric model, 
on the other hand, performed the worst {KS = 0.37 ± 0.01). 

3.3.3. Local statistics 

Finally, we tested whether the best-fitting synthetic networks generated by 
the MI model were capable of predicting the degree and clustering coefficient 
sequences of the connectome. We expressed each node’s empirical degree, ky, 
and clustering coefficient, Cy, as z-scores by standardizing the empirical values 
against the distributions obtained from the best-fitting synthetic networks. Z- 
scores were averaged across subjects and used to quantify the discrepancy in 
those measures (larger scores indicated poorer fit). We compared these z-scores 
against scores obtained from the best-htting synthetic networks generated by 
the pure geometric model in order to ascertain whether they represented an 
improvement in fitting local network measures (Figure]^). We found that, on 
average, the MI model produced smaller discrepancies (points below the diag¬ 
onal) compared to the geometric model. Typically, the largest improvements 
were for nodes whose degree or clustering coefficient was mismatched the great¬ 
est by the geometric model. For some nodes, however, the geometric model 
actually outperformed the MI model, though the standardized scores for these 
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nodes were, generally, rather small for both models. 


3.4- Application to human lifespan data 

In addition to quantifying models’ performances, we asked whether the pa¬ 
rameters of the generative models captured meaningful information about in¬ 
dividual differences in network organization. To demonstrate the utility of the 
network modeling approach for characterizing individual variation, we extended 
our analysis to the NKI dataset’s N = 126 participants, spanning a range of 
ages from 7-85 years. With an average network density of 10%, a number of 
individual’s connectomes were fractured into multiple disconnected components 
(71 of the 126 participants). However, the largest connected component, across 
all participants, included 98.5 ± 0.03 percent of all nodes, indicating that in the 
majority of cases the network is divided into two components: one singleton 
node and a component containing all other nodes. We hypothesized that age- 
related changes in network organization may be captured by the parameters of 
the generative models, rj and 7 . We tested this hypothesis by first regressing 
out participants’ intracranial volumes and mean framewise displacement from 
parameter values obtained from the best-fitting MI models and correlating the 
residuals with participant age. We also expressed energies and KS statistics as 
z-scores relative to a generative model in which connections were formed ran¬ 


domly to correct for variations in network density with age (Betzel et al. 2014 


Lim et al. 2015). This null model preserved only the density of connections 


and not degree sequence. The results of these analyses indicated that the value 
of r] decreased in magnitude with age {fage.r) = 0.39, p « 10“®'^), while 7 did 
not exhibit any significant age-related changes {fage,~f = 0.07,p « 0.45), which 
implied that the penalty on long-distance connections weakened with age. We 
also found that E, KSe, and KSc all increased with age (maxp « lO”"^'^) (Fig¬ 
ure]^, indicating that the MI model does an increasingly poor job capturing 
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the organization of older connectomes compared to younger connectomes. 

4. Discussion 

In this report, we tested different classes of generative models for the hu¬ 
man connectome. Our study makes several novel contributions, by quantita¬ 
tively comparing different sets of generative models, by applying these models 
to human connectome data, and by fitting models to networks of individual 
participants. We confirmed that pure geometric models of the form consid¬ 
ered in this report cannot create synthetic networks that were both as clustered 
and also contained the same proportion of long-distance connections as the 
observed human connectome. To identify which additional factors were most 
capable of creating realistic networks we incorporated non-geometric informa¬ 
tion into our generative models’ wiring rules. With this additional degree of 
freedom, the synthetic networks generated by these more complex models more 
accurately reproduced the connectome’s clustering and edge length distribu¬ 
tions. The best-fitting model formed connections on the basis of homophilic 
attraction (matching index) combined with geometric constraints. Importantly, 
synthetic networks generated by this model not only reproduced degree, be¬ 
tweenness centrality, clustering coefficient, and edge length distributions (all 
measures that contributed to the energy function used for optimization), but 
they also reproduced additional graph theoretic properties such as character¬ 
istic path length, mean clustering coefficient, global efficiency, modularity, the 
propensity for high-degree nodes to be connected via long-distance edges, and 
local node statistics such as degree and clustering coefficient sequences. We 
also demonstrated robustness of the matching index model, comparing it across 
three separate datasets totaling N = 380 participants and finding consistent 
results in all cases (See Appendix). As a final demonstration of the utility of 
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generative models, we fit the MI model to connectomes of individuals whose ages 
ranged from 7-85 years, showing that the distance penalty weakened with age 
while energy increased, an effect driven by growing discrepancies in clustering 
and edge length distributions. 

Generative models for brain networks have been investigated before, serving 


as proofs of concept (Kaiser and Hilgetag 2004b Kaiser et al., 2009 Henderson 


and Robinson, 2013 Lim and Kaiser 2015 Roberts et al. 2015) or as investiga¬ 


tive tools for non-human connectome data (Kaiser and Hilgetag, 2007 Costa 


et al. 2007a Nicosia et al. 2013 Langen et al., 2015). One limitation of earlier 


studies was the use of composite connectivity matrices as empirical benchmarks. 


For example, Ercsey-Ravasz et al. (2013) and Song et al. (2014) proposed geo¬ 
metric models of an incomplete macaque connectome, where connections were 
based on composite tract-tracing data compiled across multiple subjects and 
only a subset of cortical areas. Another limitation of earlier work was the lack 
of model comparison. In many cases proposed generative models were only com¬ 


pared against random generative models (Ercsey-Ravasz et al. 2013 Song et al. 


2014) where connections were formed with uniform probability, as opposed to 


models incorporating more plausible generative mechanisms. 

The first model we examined was the pure geometric model, which was the 
simplest but also, in accordance with earlier studies, performed the worst. The 
observation that geometry partly explains the topology of brain networks is in 


line with in a large literature on wiring minimization (Mitchison 1991 Laughlin 


and Sejnowski 2003 Cherniak et al., 2004 Samu et al. 2014), and has been 


appreciated in modeling studies of both human brain networks (Henderson and 


Robinsoiil 2013[ [Kaiser and Hilget^ |2004a Vertes et al.[ |2012| [Klimm et al. 


2014) and those of non-human primates (Kaiser and Hilgetag 2004a Costa 


et al. 2007a) and other mammals ([Henderson and Robinson 2011 Rubinov 










































































































































et al., 20151. While recent work has suggested that regional variation in certain 


topological properties of connectomes such as degree, clustering coefficient, and 
betweenness centrality, can be accounted for based on the geometry of the brain 


(Henderson and Robinson 20141, our findings support the view that strong 
spatial constraints alone are insufficient for explaining all topological aspects of 


brain networks (Kaiser and Hilgetag 2006 Bullmore and Sporns 2012). This 


conclusion stands in contrast to other reports (Ercsey-Ravasz et al., 2013 Song 


et al., 2014) suggesting that geometric models are the sole generative mechanism 


underlying the connectome’s formation and evolution. Instead, we find that in 
order to accurately reproduce the connectome’s topology our models required 
information about node’s pairwise similarity (homophily), which agrees with 


earlier modeling studies of the primate connectome (Costa et al. 2007aI and 


human functional brain networks (Vertes et al. 2012). 

The final component of this report was an application of network modeling to 
human lifespan data, which revealed that geometric constraints weakened while 
energy and the mismatch of clustering and edge length distributions all increased 
with age. Collectively, these results indicate that the MI model is becoming an 
increasingly poor model of the connectome as participants become older. One 
possible explanation is that connectome patterns become increasingly random 
with age, making it impossible for any wiring rule to model the connectome 
precisely. Alternatively, it could also be the case that there are proportionally 


more long-range connections later in life (Lim et al. 2015), and therefore, with 
advancing age, connectomes cannot be reproduced as accurately with a wiring 
rule that shows preference for short-range connections. Indeed, this appears 
to be case; placing each participant’s connections into bins (10 mm width) 
according to connection length and correlating bin counts with age we found 
that bin count was negatively correlated with age up to around 70 mm (Figure 
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S19). For longer connections there was no clear relationship. Future work should 


investigate, in greater detail, the underpinnings of the decrease in geometric 
constraints. 

The aim of this study was not to model the growth and development of the 
human connectome. Doing so would have required a more complicated model 
that included more system-specific detail. Instead, our models were designed to 
reduce a network’s description length. Naively, we can reconstruct a network 
exactly from a list of its nodes and edges. However, such a precise reconstruction 
may not be necessary or even desirable. Oftentimes we are more interested in a 
network’s high-level properties (e.g. modularity, degree distribution, etc.), than 
the exact configuration of its connections. In such a case, a mechanism that 
generates synthetic networks with the approximately the same set of properties 
represents a much more economical (compressed) description of the network. 
Our models are in line with this approach, seeking a parsimonious description 
of the human connectome, wherein its overt complexity gets compressed into a 
model’s wiring rule and parameters. This type of compressed description can 
be used toward any number of ends, including investigation of differences in 
individual participants. For instance, we found that some participants’ connec- 
tomes were compressible (low energy) while others were not (high energy). An 
important question, moving forward, is whether these differences become mean¬ 
ingful when examining individual differences or comparing clinical and control 
populations, or whether they can be related to some behavioral measures across 
both individual and group levels. 

There are a number of methodological considerations that should be dis¬ 
cussed. First, the class of dual-term models left the definition of K{u,v) up 
to the user. For practical reasons, we explored only twelve such rules. Even 
with this limited exploration, we found a great deal of stratification in terms 
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of model performance. This leaves open the possibility that wiring rules not 
explored in this report could produce superior results. For example, geometric 
models of forms not considered here (e.g. “connect all nodes separated by a dis¬ 


tance less than dthreshoid \ models with regional inhomogeneities (see Rubinov 


et al. (2015))), or more accurate modeling of interregional white-matter fiber 


lengths in place of Euclidean distance could possibly lead to improved hts. In 
addition, models of altogether different form could be implemented. For exam¬ 
ple, in our report we make the assumption that the formation of connections 
depends equally on both geometric and topological constraints. An alternative 
approach might be to form one subset of connections on the basis of geometry 
and another set of connections on the basis of topology. While enumerating of 
all possible wiring rules or model variations is impractical, a number of meth¬ 
ods have been proposed that aim to discover wiring rules by evolving models 


themselves (Bailey et al. 2012 Menezes and Roth, 2014), as opposed to propos¬ 
ing a model and htting its parameters, as we did here. These approaches, we 
believe, warrant further attention. In any case, consideration of a broader class 
(or classes) of models represents an important avenue for future work. 

Another methodological consideration concerns the evaluation of a synthetic 


network’s htness. The synthetic networks are mapped into a morphospace (Gohi 


et al. 2013) according to their geometric and topological properties and com¬ 


pared to the observed connectome along the same dimensions. Whether these 
properties are the most appropriate measures for network comparison is unclear. 
In principle, one could dehne alternative energy functions whose minima may 
not coincide with those reported here, and for which the MI model is not the best 
performer. Though the exploration of alternative energy functions is beyond the 
scope of this report, we attempted to mitigate the concern that our choice of 
energy function biased our results by performing a series of additional tests, the 
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results of which indicated that the MI model consistently outperformed other 
models. 

Another consideration relates to the combination of diffusion imaging and 
tractography for inferring the connectome’s organization. Though diffusion 
imaging/tractography represents the state of the art for in vivo reconstruc¬ 
tion of the brain’s anatomical connections, these technologies are nonetheless 


prone to false positives and negatives (Thomas et al. 20141, which could poten¬ 
tially affect our results. While the use of multiple atlases, independent datasets, 
and alternative processing streams help reduce the bias of any single processing 
strategy they do not completely address the issue. The shortcomings of diffusion 
imaging and tractography, while presently limiting, also serve to highlight the 
need to development new non-invasive methods for mapping the human brain. 

A final consideration is related to the size of networks, the definition of nodes, 
and the scalability of our models. In general, how one defines a network’s nodes 


has implications for the network properties of the resulting graph (Zalesky et al. 


20101. It is likely that the size and number of nodes factor into the performance 


of the models studied here. The networks analyzed in this report consisted of 
either n = 74 or n = 108 nodes, representing two different parcellations of the 
cortex. However, it is becoming increasingly common to model brain networks 
with up to thousands of nodes. Because the number of possible positions to 
place an edge grows as 0{n^), the space of all networks that the model could 
generate becomes much larger as n increases. Models with n » 10^ may 
require stronger parametric constraints (e.g. larger magnitudes for rj or 7 ) or 
incorporating additional topological information (and an additional parameter) 
into a model’s wiring rule. In general, the choice of how to define a network’s 
nodes and at what scale the human connectome is best described is unclear, 
though future work on data-driven parcellations will surely help address this 
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issue. 


5. Appendix 

The main text describes the results of generative models applied to a dataset 
of 40 participants scanned at CHUV. In this appendix we demonstrate the ro¬ 
bustness of those results by reproducing the principal findings using alternative 
datasets. The additional datasets are described, briefly, below and in more de¬ 
tail later in this appendix. Figures S1-S9 shows model energies for each of the 
additional datasets, reproducing Figure 2 from the main text. 

1. Two replication datasets (HCP and NKI) of A = 214 and N = 126 
participants, respectively. 

2. The same CHUV dataset with different levels of network density (5% and 
15%) and defined using an alternative weighting. 

3. CHUV dataset including both left/right cerebral hemispheres. 

4. Composite (i.e. group averaged) CHUV, HCP, and NKI connectomes. 

5. Composite CHUV dataset using fiber length in place of Euclidean distance. 

6 . Composite CHUV dataset using an exponential function to model geo¬ 
metric constraints in place of the power-law function. 

7. Composite CHUV dataset using a finer cortical parcellation (n = 223 
nodes.) 

Each additional dataset is accompanied by a figure showing the energy dis¬ 
tribution for the 100 best-fitting synthetic networks for each model type. At the 
end of this appendix we have also included a glossary of graph theoretic terms 
that appear throughout the main text. 
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Additional Datasets 


5.1. Human connectome project (HCP) - See Figure SI 

The HCP data were drawn from the 215 participants made available as part 


of the Q3 release of the human connectome project (Van Essen et ah, 2012 


Glasser et al. 2013). From each participant’s diffusion-weighted MR images 


(diffusion tensor imaging; DTI), white matter fibers were reconstructed from 
generalized q-sampling (Yeh et ah, 2010| ) (GQI: allowing for the reconstruction 
of crossing fibers) and streamline tractography and the cortex was parcellated 
into 219 parcels based on a subdivision of FreeSurfers’s Desikan-Killiany atlas 


(Cammoun et al. 2012). More details on the processing of these data can be 


found elsewhere (de Reus and van den Heuvel 2014). We focused on the right 


hemisphere only, which consisted of n = 108 regions. We imposed a threshold 
on streamline counts of 5 (i.e. a minimum of five streamlines must be present for 
us to consider two regions linked by a binary connection) in order to maintain an 
average connectome density of p « 10% across subjects. We excluded a single 
subject on the grounds that their total streamline count was greater than two 
standard deviations from the group mean, leading to a final dataset of A = 214 
participants. 

5.2. Nathan Kline Institute, Rockland, NY (NKI) - See Figure S2 

The NKI dataset consists of A = 126 participants whose ages ranged from 


7-85 years (Nooner et al.[ 20121. Tractography was performed using the Gonnec- 


tome Gomputation System (GGS: http://lfcd.psych.ac.cn/ccs.html). A more de¬ 


tailed description of the processing pipeline was included in other reports (Betzel 


et al. 2014 Gao et al. 2014 Yang et al. 2014). Unlike the HGP and GHUV 


datasets, the cortex was parcellated into 148 regions according to the Destrieux 


atlas (Destrieux et al. 2010). We analyzed a single hemisphere (n = 74 regions). 
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but instead of focusing on either the right or left, we formed a composite matrix 
by combining the streamline counts between homotopic pairs of regions. We, 
again, enforced a mean density of p « 10% by selecting a streamline threshold 
of 30 streamlines. 


5.3. Alternative CHUV datasets - See Figures S3-S6 

We investigated four variants of the CHUV dataset. In the main text we 
analyzed binary connectivity matrices (average density of p « 10%) by applying 
a threshold to streamline counts. The first two variants were constructed in the 
same manner but with the threshold level chosen to maintain average densities 
of p « 5% and p « 15%. The third variant retained a threshold of p « 10% 
but instead of thresholding streamline counts we thresholded ’’fiber density” 
matrices. The fiber density between nodes u and v is common choice for edge 
weights in weighted anatomical brain networks, and is defined as the number 


of streamlines divided by the sum of u and u’s surface areas (Hagmann et al. 


2008 Betzel et al. 2013 Goni et al. 20141. The fourth variant was constructed 


by thresholding streamline counts to p « 10% but included both left and right 
cerebral hemispheres. 


5 . 4 . Group-average matrices - See Figures S7-S9 

In addition to single-participant modeling, we analyzed group-average con¬ 
nectivity matrices for all three datasets (CHUV, HCP, and NKI). Group-average 
matrices boost the ratio of signal to noise by emphasizing connections that are 
consistently expressed across subjects, thereby rendering the human connec- 
tome more clearly. The de facto method for generating group-average matrices 
is to retain the supra-threshold elements of the [n x n] consistency matrix, C, 
whose element indicates the fraction of all participants in which a connec¬ 
tion was present between nodes u and v. The resulting matrix, however, over¬ 
expresses short-range connections, as short-range connections are more easily 
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reconstructed and are hence the most consistent connections across subjects 
whereas long-range connections are more prone to error. Also, this method 
forces a user to choose, somewhat ad hoc, the threshold for including a con¬ 
nection in the group-average matrix. Instead, we use an alternative method for 
generating a group-average connectomes whose edge-length distribution matches 


that of the typical single-participant distribution (Misic et ah, 20151. Briefly, 
this method begins by first estimating the average number of connections of a 
given length in a typical participant’s connectome. Next, all pairs of nodes sep¬ 
arated by a comparable distance are identified and, from among this subset, the 
most consistent connections are added to the group-average connectivity ma¬ 
trix. Repeating this process for all distances yields a representative connectome 
that matches, almost exactly, the typical edge length distribution, but features 
only the most consistently expressed edges at each connection length. 


5.5. CHUV Group-average matrix with fiber length - See Figures SlO-Sll 

In this report, we test the hypothesis that the human connectome emerges 
as a consequence of both topological and spatial constraints, which we model as 
power-law functions. In doing so, we assume that the material/metabolic cost 
of fiber tracts can be equated to Euclidean distance separating its endpoints, 
rather than the actual integrated length of the curved tract. The argument 
for doing so is threefold. First, estimates of fiber length can only be obtained 
for completed streamlines, meaning that no estimates exist for connections that 
were absent in the observed tractography data. In order to fill in the missing 
fiber lengths, one can resort to fiber interpolation (i.e. using the distance/fiber 
length relationship of existing connections to estimate the fiber length of missing 
connections), which necessarily introduces an additional source of uncertainty. 
Second, the relationship of fiber length and Euclidean distance is rather strong 
across our datasets: the amount of variance in fiber length accounted for by 
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Euclidean distance was 66%, 32%, and 79% for the CHUV, NKI, and HCP 
datasets, respectively. Lastly, a recent study used both Euclidean distance and 
the measured length of axons in a geometric generative model of the mouse 
connectome 


Rubinov et al. (2015), finding no change in their results. For these 


reasons, we assert that Euclidean distance, though imperfect, is a reasonable 
proxy for the cost of forming a connection. 

Nonetheless, we felt it prudent to test the effect of using fiber length in place 
of Euclidean distance in our models. To do so we first constructed an interpo¬ 
lated fiber length matrix. The elements of this matrix contain fiber length esti¬ 
mates for the hypothetical tracts linking the pair of nodes, u and v. To obtain 
such estimates for nodes u and v, we calculate the Euclidean distance, E(u,v) 
between their respective centroids. We then consider all geometric neighbors of 
u and all geometric neighbors of u, where a geometric neighbor of u is any other 
node whose centroid is less than E{u, v) x t away from that of u. Here, we use 
a fixed value r = 0.2. The fiber length of the hypothetical connection between 
nodes u and v is set equal to the mean fiber length of connections between u 
and any of u’s geometric neighbors and v and any of v’s geometric neighbors. 
If no connections exist between these subsets of nodes, then we used the /3 coef¬ 
ficients from the Euclidean distance versus hber length linear regression model 
to estimate a fiber length. 


5.6. CHUV Group-average matrix with exponential rule - See Figure S12 

In the main text we modeled the geometric wiring rule as a power-law func¬ 
tion. However, recent work has suggested that an exponential function better 


captures the relationship between edge length and connection probability (Hen¬ 


derson and Robinson, 2014). To this end, we replaced the geometric power-law 


function in our geometric models with an exponential function and re-ran our 
analyses. 


25 









5.7. CHUV Group-average matrix with finer cortical parcellation - See Figure 


SIS 


The pipeline described in Cammoun et al. (2012) makes it possible to parti¬ 
tion the cerebral cortex into parcels at several different scales or resolutions. In 
our main analysis, we used an intermediate scale (n = 219 cortical parcels, with 
n = 108 parcels in the right hemisphere). In this section, we repeat our analysis 
for a composite group-average CHUV matrix generated at the next finest scale, 
which has n = 223 cortical parcels in the right hemisphere. The group-average 
matrix was generated using the same procedure as described earlier. 


5.8. Graph theory 

In the main text we characterize networks using a number of different graph- 


theoretic measures. Here we describe those measures in some detail (Rubinov 


and Sporns 2010). 


• Adjacency matrix: A topology of an n-node network can be described by 
the matrix A = [a„„]. The elements are equal to 1 if nodes u and v 
are connected and are otherwise equal to 0. 


• Node degree: A node’s degree counts the total number of connections that 
node makes. In an undirected network (i.e. Uuv = cLvu) a node’s incoming 
and outgoing degrees are equivalent, and can be calculated as the sum 
across rows or columns of A, i.e. ku = 

• Network density: A network’s density, p, is equal to the fraction of existing 

connections out of the total number of possible connections. If the total 
number of connections is equal to 2m = ku, then network density is 

equal to p= 
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Degree assortativity: Degree assortativity measures the extent to which 
nodes of similar degree connect to one another. It is usually operational¬ 
ized as a correlation measure, i?fc, which measures the Pearson correlation 


of the stub degrees of all edges (Newman 20021. 


Clustering coefficient'. A node’s clustering coefficient measures the density 
of a node’s neighborhood. Phrased differently, clustering coefficient it 
measures the fraction of a nodes’ neighbors that are connected to one 
another. If = | 'Yhvw '^uvO'uw'^vw is the number of triangles (connected 
neighbors) surrounding node u, then u’s clustering coefficient is equal to 
c„ = (k"-!) ■ mean clustering coefficient of a network is simply the 

average of Cu over all possible u. 


Characteristic path length'. The shortest path matrix, D = [duv], mea¬ 
sures the length of the shortest paths between all pairs of u and v. The 
characteristic path length is the average length of all shortest paths and 
is calculated as L = 

Network diameter'. A network’s diameter is the longest shortest path be¬ 
tween any two nodes, and is calculated as max{duv)- 


Clobal efficiency. A network’s efficiency is closely related to characteristic 
path length. Rather than calculating the average length of a shortest 
path, efficiency is calculated as the average length of Specifically, a 
network’s efficiency is calculated as E = n(n-i) ' 


Modularity. Many network measures describe a network’s organization 
at the level of individual nodes or with a global summary statistic. Al¬ 
ternatively, it is possible to describe a network’s ’’large-scale structure” 


(Newman, 20121 - i.e. its organization at an intermediate scale. Perhaps 


the most common type of large-scale structure is known as a network’s 











community structure or a division of a network into internally dense and 


externally sparse modules (Fortunate, 2010 Sporns and Betzel 2015). 


The most popular method for identifying a network’s communities and 


evaluating their fitness is to maximize a ’’modularity” function (Newman 


and Girvan 2004): 


Q — ^ Puv\^(,9ui 9v) 


( 4 ) 


In this expression, S {1,..., iF} is the community to which node u is 
assigned, S{gu,9v), is the Kronecker delta function and is equal to unity 
when gu = gv, and PuV indicates the expected number of connections 
between u and v under a particular null model (typically , which 

is the expected weight under the null model where each node’s degree is 
preserved but connections are otherwise made randomly). In general, 
high quality modules produce large values of Q. To maximize Q, then, 
one needs to ensure that connections satisfying (uuv —Puv) > 0 fall within 
communities. The process of modularity maximization is computationally 
intractable for all but the most trivial cases, though many heuristics are 
available for identifying near-optimal modules. Here, we use the Louvain 


algorithm (Blondel et al. 2008) to produce 100 near-optimal estimates of 
modules. 
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Figure 1: Summary of the geometric model: (A) observed (black) and synthetic networks 
generated at different points in parameter space. (B) Energy landscape showing the behavior 
of KSe^ KSc, and energy as a function of r]. The dashed vertical lines indicate the param¬ 
eter values at which the example synthetic networks were generated. (C) Distribution of 
rj parameter of top 1% lowest-energy synthetic networks aggregated across all participants. 
(D) Cumulative distributions of degree (orange), clustering coefficient (green), betweenness 
centrality (yellow), and edge length (purple) for observed connectome (darker line) and best¬ 
fitting synthetic networks (lighter lines) for a representative participant. 
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Figure 2: Energy distributions across all models. Each box plot represents the top 1% lowest 
energy synthetic networks generated by each model and aggregated across all participants. 
The color of each plot indicates the general class of the model: Homophily is shown in blue, 
clustering in pink, degree in green, and geometric in purple. The specific wiring rule names 
are shown along the x-axis. 
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Figure 3: Matching Index Model: (A) observed (black) and synthetic networks generated at 
different points in parameter space. (B) Energy landscape showing the points at which the 
example synthetic networks were generated. (C) Distribution of rj and 7 parameters of best¬ 
fitting synthetic networks aggregated across all participants. (D) Tradeoff between rj and 7 . 
Each point represents the mean parameter values for an individual participant. Participants 
with larger values of rj tend to have the smallest magnitude 7 and vice versa. (E) KS statistic 
landscapes for degree (orange), clustering (green), betweenness (yellow), and edge length 
(purple) for observed connectome and best-fitting synthetic networks for a single participant. 
(F) Cumulative distributions of degree (orange), clustering (green), betweenness (yellow), and 
edge length (purple) for observed connectome (darker line) and best-fitting synthetic networks 
(lighter lines) for a representative participant. 
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Figure 4: Cross validation of the matching index model: (A) Comparison of matching index 
model and observed connectomes in terms of the graph-theoretic measures mean clustering 
coefficient, modularity, global efficiency, and characteristic path length. (B) Comparison of all 
models in terms of reproducing the distance-dependent degree assortativity (i.e. the propensity 
for high degree nodes to be linked by long-distance connections). (C) Discrepancies in degree 
and clustering coefficient sequences of synthetic networks generated by the matching index 
model and pure geometric model. 



Figure 5: Changes in model parameters and energy components across the lifespan: (A) The 
geometric parameter, r] weakens with age. (B) The average energy of each participant’s best¬ 
fitting synthetic networks (z-scored against an ensemble of synthetic networks generated using 
a uniform wiring rule) also increases with age. (C, D) KSe and KSc increase with age, and 
these increases collectively drive the increase in energy. 
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Figure SI: Model energies for HCP dataset. 
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Figure S2: Model energies for NKI dataset. 
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Figure S3: Model energies for CHUV dataset with p ^ 5%. 
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Figure S4: Model energies for CHUV dataset with p ^ 15%. 
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Figure S5: Model energies for CHUV dataset with p ^ 10% and edge presence/absence 
determined by fiber density weights rather than streamline/fiber tract counts. 
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Figure S6: Model energies for CHUV dataset with p 5^ 10% but for entire cerebral cortex. 
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Figure S7: Model energies for CHUV composite connectivity matrix. 
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Figure S8: Model energies for HCP composite connectivity matrix. 
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Figure S9: Model energies for NKI composite connectivity matrix. 
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Figure SIO: Relationship between Euclidean distance and fiber length. (A) Scatter plot 
of Euclidean distance and fiber length for connections across all participants in the CHUV 
dataset. (B) The group-average Euclidean distance matrix obtained as the average distance 
between centroids across all participants. (C) Interpolated fiber length matrix, generated 
by the procedure described in the Appendix. (D) Scatter plot of group-average Euclidean 
distance versus interpolated fiber length. The red line in panel D is the identity line. Note 
that most connections have longer fiber length than Euclidean distance. However, a small 
number of connections have shorter fiber lengths. This occurs for connections that originate 
and terminate near the boundary of two parcels. In this case, the fiber length can be very 
short, while the Euclidean distance between the parcels can be long. 
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Figure Sll: Model energies for CHUV composite connectivity matrix using the interpolated 
fiber length matrix in place of Euclidean distance. 
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Figure S12: Model energies for CHUV composite connectivity matrix using an exponential 
function in place of the power-law function for the geometric wiring rule. 
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Figure S13: Model energies for CHUV composite connectivity matrix with a higher-resolution 
partition (n = 455 cortical nodes; n = 223 cortical nodes in the right hemisphere). 
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Figure S14: Detailed explanation of how the generative model works. Panel A shows a toy 
network, with nodes embedded along the perimeter of the unit circle. Given this network and 
our generative models, we can ask the following question: If we wish to add a new edge to 
the network according to the wiring rule: P(u., v) = K{u, x D{u, v)'^, where will that edge 
most likely go? To answer this question, we need to first calculate the distance between all 
pairs of nodes (panel B), whose elements we raise to the power r) = —1 (panel C). The other 
component we need is the matrix, K{UyV), which represents the non-geometric component. 
One possible definition of K{Uy v) is the product of node degrees (the deg. prod model). Given 
this particular definition, we set K{u,v) = kuX kv. To generate this matrix, we first calculate 
ku for all u (panel D). Then we multiply ku x kv for all pairs of nodes, {u, v} (panel E). We 
next raise K{u, v) to the power 7 = 1 , which we show in panel F. We perform the element-wise 
multiplication of K(u,v)^ by D(u,v)'^ (panel G, left). Finally, we have to remove the pairs, 
{u, r?}, for which a connection already exists (the gray cells in panel G, right). The nonzero 
elements of this matrix give us the relative probabilities of where an edge would get placed 
given this wiring rule. After placing the edge according to these probability, the model would 
return to panel A and the process would repeat itself. 
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Figure S15: We show the observed degree distribution (black bar plot) against the degree 
distributions of the best-fitting synthetic networks generated with each model. In all panels, 
the x-axis indicates degree (fc) and the y-axis indicates frequency. This figure shows data from 
a single representative subject in the CHUV cohort. 
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Figure S16: We show the observed clustering coefficient distribution (black bar plot) against 
the clustering coefficient distributions of the best-fitting synthetic networks generated with 
each model. In all panels, the x-axis indicates clustering coefficient (c) and the y-axis indicates 
frequency. This figure shows data from a single representative subject in the CHUV cohort. 
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Figure S17: We show the observed betweenness centrality distribution (black bar plot) against 
the betweenness centrality distributions of the best-fitting synthetic networks generated with 
each model. In all panels, the x-axis indicates betweenness centrality (6) and the y-axis 
indicates frequency. It should be noted, that due the orders of magnitude difference between 
the most central and least central nodes, the x-axis has been log-transformed. This figure 
shows data from a single representative subject in the CHUV cohort. 
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Figure S18: We show the observed edge length distribution (black bar plot) against the edge 
length distributions of the best-fitting synthetic networks generated with each model. In all 
panels, the x-axis indicates edge length (e) and the y-axis indicates frequency. This figure 
shows data from a single representative subject in the CHUV cohort. Note that of the models 
shown here, ” matching” and ” neighbors” do a reasonable job generating networks with a broad 
range of connection lengths. This is in stark contrast with the geometric model ”geom,” which 
over-represents short range connections. 
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Figure S19: Correlation of age with number of connections of particular lengths. The height 
of each bar represents the magnitude of the Pearson’s correlation coefficient. The bins with 
an asterisk exhibit statistically significant change (p j 0.05, uncorrected). 
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